Extensive Simulations for Longest Common Subsequences: Finite Size Scaling, a Cavity Solution, and Configuration Space Properties

Author

  • J. Boutet
Abstract

Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Let $L_N$ be the length of an LCS of two random strings of size N. Using extensive Monte Carlo simulations for this problem, we find a finite size scaling law of the form $E(L_N)/N = \gamma_S + A_S/(\ln N \sqrt{N}) + \dots$, where $\gamma_S$ and $A_S$ are constants depending on S, the alphabet size. We provide precise estimates of $\gamma_S$ for $2 \le S \le 15$. We also study the related Bernoulli Matching model where the different entries of the "strings" are matched independently with probability $1/S$. Let $L^B_{NM}$ be the length of a longest sequence of matches in this case, for a given instance of size $N \times M$. On the basis of a cavity-like analysis we find $\gamma^B_S(r) = (2\sqrt{rS} - r - 1)/(S - 1)$, where $\gamma^B_S(r)$ is the limit of $E(L^B_{NM})/N$ as $N \to \infty$, the ratio $r = M/N$ being fixed. This formula agrees very well with our numerical computations of $E(L^B_{NM})$. It also provides a very good approximation for $\gamma_S(r)$, the corresponding function of the random string model, the approximation getting better as S increases. We finally study the "ground state" properties of this problem. We find that the number $N_{LCS}$ of solutions typically grows exponentially with N. In other words, this system does not satisfy "Nernst's principle". This is also reflected at the level of the overlap between two LCSs chosen at random, which is found to be self-averaging and to approach a definite value $q_S < 1$ as $N \to \infty$.
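The quantities named in the abstract can be illustrated with a small Python sketch. It is not the author's simulation code: the abstract does not say which algorithm or sample sizes were used, so the textbook dynamic program is assumed here, and all function names (`lcs_length`, `bernoulli_matching_length`, `estimate_ratio`, `cavity_gamma`) are hypothetical. The sketch computes the LCS length of two strings, the longest chain of matches in a Bernoulli Matching instance, a crude Monte Carlo estimate of $E(L_N)/N$, and the cavity formula $(2\sqrt{rS} - r - 1)/(S - 1)$.

```python
import math
import random


def lcs_length(x, y):
    """LCS length via the textbook O(N*M) dynamic program (assumed, not from the paper)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def bernoulli_matching_length(n, m, s, rng):
    """Longest chain of matches when each cell (i, j) of an n x m array
    is a match independently with probability 1/s."""
    p = 1.0 / s
    prev = [0] * (m + 1)
    for _ in range(n):
        cur = [0]
        for j in range(1, m + 1):
            best = max(prev[j], cur[j - 1])
            if rng.random() < p:
                best = max(best, prev[j - 1] + 1)
            cur.append(best)
        prev = cur
    return prev[-1]


def estimate_ratio(n, s, samples, seed=0):
    """Monte Carlo estimate of E(L_N)/N for random strings over s letters."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        x = [rng.randrange(s) for _ in range(n)]
        y = [rng.randrange(s) for _ in range(n)]
        total += lcs_length(x, y)
    return total / (samples * n)


def cavity_gamma(s, r=1.0):
    """Cavity formula from the abstract: (2*sqrt(r*S) - r - 1) / (S - 1)."""
    return (2.0 * math.sqrt(r * s) - r - 1.0) / (s - 1.0)


if __name__ == "__main__":
    # Small illustrative sizes only; far below what "extensive" simulations would use.
    for s in (2, 4, 8):
        print(s, estimate_ratio(200, s, samples=20), cavity_gamma(s))
```

At such small N the Monte Carlo estimate still carries the finite size correction described by the scaling law, so it should only be read as a rough comparison with the cavity value.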

Similar resources

Extensive Simulations for Longest Common Subsequences: Finite Size Scaling, a Cavity Solution, and Configuration Space Properties

Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Let $L_N$ be the length of an LCS of two random strings of size N. Using extensive Monte Carlo simulations for this problem, we find a finite size scaling law of the form $E(L_N)/N = \gamma_S + A_S/(\ln N \sqrt{N}) + \dots$, where $\gamma_S$ and $A_S$ are const...

An Effective Branch-and-Bound Algorithm to Solve the k-Longest Common Subsequence Problem

In this paper, we study the Longest Common Subsequence problem of multiple sequences. Because the problem is NP-hard, we devise an effective Branch-and-Bound algorithm to solve the problem. Results of extensive computational experiments show our method to be effective not only on randomly generated benchmark instances, but also on real-world protein sequence instances.

Hardness of Longest Common Subsequence for Sequences with Bounded Run-Lengths

The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science with extensive applications in diverse areas ranging from spelling error correction to molecular biology. This paper focuses on LCS for fixed alphabet size and fixed run-lengths (i.e., maximum number of consecutive occurrences of the same symbol). We show that LCS is NP-complete even when rest...

Similarity Search on Uncertain Spatio-temporal Data

In this work, we address the problem of similarity search in a database of uncertain spatio-temporal objects. Each object is defined by a set of observations ((time, location)-tuples) and a Markov chain which describes the object's uncertain motion in space and time. To model similarity which is an important building block for many applications such as identifying frequent motion patterns or traj...

Publication date: 1998